Tag
15 articles
This article explains NVIDIA's X-Token, a novel knowledge distillation technique that improves the performance of smaller language models by addressing token misalignment issues in previous methods like GOLD. It details how projection-guided cross-tokenizer alignment enhances model compression and deployment efficiency.
As AI transforms the workplace, professionals are mastering the skills that make them 'AI native': fluently integrating artificial intelligence into their workflows and optimizing how they use it.
Poetiq's new meta-system automatically builds a model-agnostic inference harness that improves performance across multiple LLMs without fine-tuning.
Learn how speculative decoding helps AI systems generate text faster without losing accuracy, using a fast guess-and-check method.
The Qwen team has released FlashQLA, a high-performance linear attention kernel library that achieves up to 3x speedup on NVIDIA Hopper GPUs, enhancing both pretraining and edge-side inference.
LoRA, a widely used technique for fine-tuning large language models, assumes all updates are similar, a premise that fails in real-world production environments. This limitation is now prompting a reevaluation of its effectiveness in complex, diverse applications.
This explainer explores how AI model optimization techniques have made older smartphones more efficient than newer models, challenging the assumption that newer is always better.
Learn how TriAttention, a new AI method, compresses memory in large language models to make them 2.5x faster without losing accuracy.
This explainer explores Google's TurboQuant technology, a real-time quantization approach that reduces AI computational costs and enables local deployment of large models.
This article explains how AI-driven operating system optimization works, examining the machine learning techniques and system architecture changes that enable Windows 11 to adapt dynamically to user behavior and performance requirements.
This article explains hyperagents, advanced AI systems that can improve both their task performance and their own learning mechanisms. It explores how these self-improving systems work and why they represent a significant advancement in artificial intelligence.
This explainer explores how AI-powered desktop virtualization works, combining containerization with machine learning to create snappy, portable desktop environments that feel native to users.